What is claimed is: 
1. A processor comprising:
an apparatus to rotate registers in software pipelined loops; and
a register rotation prediction unit to predict register addresses for future loop iterations.

2. The processor of claim 1 further including a buffer to hold buffered instructions with predicted register addresses.

3. The processor of claim 2 further including unarchitected predicate registers to predicate the buffered instructions.

4. The processor of claim 2 wherein the predicted register addresses are such that the buffered instructions can be issued simultaneously with a branch instruction.

5. The processor of claim 1 further including a hint register to encode prediction hints for the register rotation prediction unit.

6. The processor of claim 5 wherein the hint register is configured to hold static hints generated by a compiler.

7. The processor of claim 5 wherein the hint register is configured to hold dynamic hints generated at runtime.

8. The processor of claim 5 wherein the hint register includes a field to specify an iteration distance.

9. The processor of claim 1 further comprising a plurality of unarchitected frame marker registers.

10. The processor of claim 9 wherein the register rotation prediction unit comprises speculation decision making hardware to compute values for the plurality of unarchitected frame marker registers.

11. The processor of claim 10 further comprising register renaming hardware in a pipeline, the register renaming hardware being responsive to the plurality of unarchitected frame marker registers.

12. The processor of claim 1 further comprising a trace cache.

13. The processor of claim 12 wherein the trace cache is configured to hold a prediction hint for each trace.

14. The processor of claim 13 further comprising a trace cache fill unit to apply register rotation prediction to traces as traces are constructed.

15. A processing system comprising:
an execution pipeline;
cache memory coupled to the execution pipeline to hold processor instructions arranged in a software loop; and
register rotation prediction hardware to predict physical register values for the processor instructions in future iterations of the software loop.

16. The processing system of claim 15 further comprising:
a software pipeline instruction buffer coupled between the execution pipeline and the register rotation prediction hardware to hold the processor instructions in future iterations of the software loop.

17. The processing system of claim 16 further comprising:
at least one unarchitected frame marker register coupled to the register rotation prediction hardware to hold predicted register offsets for future iterations.

18. The processing system of claim 17 wherein the execution pipeline includes register renaming logic responsive to the at least one unarchitected frame marker register.

19. The processing system of claim 16 wherein the register rotation prediction hardware includes a circuit to specify complete physical register addresses for the processor instructions in future iterations of the software loop.

20. The processing system of claim 19 wherein processor instructions held in the software pipeline instruction buffer include fully specified physical register addresses.

21. The processing system of claim 16 wherein the execution pipeline is configured to speculatively execute instructions received from the software pipeline instruction buffer.

22. The processing system of claim 21 further comprising a plurality of unarchitected predicate registers, wherein the instructions within the software pipeline instruction buffer are predicated on at least one of the plurality of unarchitected predicate registers.

23. A method of executing a software pipelined loop comprising:
rotating registers for each iteration of the loop; and
predicting register rotations for future iterations of the loop.

24. The method of claim 23 wherein the software pipelined loop comprises at least one branch instruction, the method further comprising issuing at least one non-branch instruction simultaneously with the at least one branch instruction.

25. The method of claim 24 wherein the at least one non-branch instruction is predicated on an unarchitected predicate register.

26. The method of claim 24 further comprising speculatively removing stop bits from the at least one branch instruction.

27. The method of claim 26 further comprising speculatively executing the at least one non-branch instruction.

28. The method of claim 23 wherein predicting comprises:
responsive to a hint register, predicting register rotations for more than one iteration in the future; and
modifying at least one unarchitected frame marker register.

29. The method of claim 28 further comprising:
speculatively executing instructions for the more than one iteration in the future; and
squashing the speculative execution if a data dependence is violated.

30. The method of claim 29 further comprising modifying the hint register when speculative execution is squashed.
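
By way of illustration only, and without limiting the claims, the method of claims 23 and 28 to 30 can be sketched in software. The sketch below is not the claimed hardware. The names `NUM_ROTATING`, `physical_register`, `rotate`, `predict_physical_registers`, and `HintRegister` are hypothetical, as are the modulo mapping, the decrementing rotation base, and the back-off policy applied on a squash; none of these details are specified in the claims.

```python
from dataclasses import dataclass

# Hypothetical size of the rotating register region (assumption).
NUM_ROTATING = 8


def physical_register(virtual: int, base: int, size: int = NUM_ROTATING) -> int:
    """Map a virtual rotating register to a physical one using a rotation base."""
    return (virtual + base) % size


def rotate(base: int, size: int = NUM_ROTATING) -> int:
    """Rotate registers for one loop iteration.

    Assumption: the base is decremented modulo the region size, so a value
    written to virtual register v is visible as v + 1 in the next iteration.
    """
    return (base - 1) % size


def predict_physical_registers(virtual: int, base: int, distance: int,
                               size: int = NUM_ROTATING) -> list:
    """Predict the physical register used by `virtual` in each of the next
    `distance` iterations, without changing the architected base."""
    predicted = []
    future_base = base  # stands in for an unarchitected frame marker value
    for _ in range(distance):
        future_base = rotate(future_base, size)
        predicted.append(physical_register(virtual, future_base, size))
    return predicted


@dataclass
class HintRegister:
    """Hypothetical hint register with an iteration-distance field."""
    iteration_distance: int = 2

    def on_squash(self) -> None:
        # Assumed policy: predict one iteration less far ahead after a
        # squash, but never fewer than one iteration.
        self.iteration_distance = max(1, self.iteration_distance - 1)


if __name__ == "__main__":
    hint = HintRegister(iteration_distance=3)
    print(predict_physical_registers(0, 0, hint.iteration_distance))
    hint.on_squash()  # e.g. after a data dependence violation
    print(hint.iteration_distance)
```

In this sketch the hint register only controls how many iterations ahead the prediction runs. A squash shortens that distance, which is one possible reading of modifying the hint register when speculative execution is squashed.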